Supervised learning using Mahalanobis distance for record linkage
Not recorded
Abstract
In data privacy, record linkage is a well-known technique used to evaluate the disclosure risk of protected data. The main idea is to link records from different databases that refer to the same individuals. In this paper we introduce a new parametrized variation of record linkage relying on the Mahalanobis distance, and a supervised learning method to determine the optimum simulated covariance matrix for the linkage process. We evaluate and compare our proposal with other previously studied parametrized and non-parametrized variations of record linkage, such as the weighted mean or the Choquet integral, for which the optimal fuzzy measure is determined.
URL: http://agop2011.ciselab.org/proceedings
Source URL: https://www.iiia.csic.es/en/node/54955
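As an illustration of the distance-based linkage the abstract describes, the following is a minimal Python sketch, not the authors' implementation. It links each protected record to the original record at the smallest squared Mahalanobis distance, d²(a, b) = (a - b)ᵀ Σ⁻¹ (a - b), and reports the fraction of correct links as a disclosure-risk estimate. The function names, the toy data, and the use of the sample covariance of the original data as Σ are assumptions made for this example; the paper instead learns the matrix with a supervised method, which is not reproduced here.

```python
import numpy as np


def link_records(original, protected, sigma_inv):
    """For each protected record, return the index of the closest original
    record under the squared Mahalanobis distance defined by sigma_inv."""
    # diffs[m, n, :] = protected[m] - original[n]
    diffs = protected[:, None, :] - original[None, :, :]
    # d2[m, n] = diffs[m, n]^T @ sigma_inv @ diffs[m, n]
    d2 = np.einsum("mnk,kl,mnl->mn", diffs, sigma_inv, diffs)
    return d2.argmin(axis=1)


def linkage_rate(original, protected, sigma_inv):
    """Fraction of protected records linked to their true original record.
    Assumes row i of `protected` is the masked version of row i of `original`."""
    links = link_records(original, protected, sigma_inv)
    return float(np.mean(links == np.arange(len(protected))))


# Toy data (hypothetical): three original records and lightly perturbed copies.
original = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
protected = np.array([[0.5, -0.3], [9.6, 0.4], [0.2, 10.5]])

# Assumption: use the sample covariance of the original data as the matrix.
sigma = np.cov(original, rowvar=False)
sigma_inv = np.linalg.inv(sigma)

links = link_records(original, protected, sigma_inv)
rate = linkage_rate(original, protected, sigma_inv)
print(links, rate)
```

Passing the identity matrix as `sigma_inv` reduces this to standard Euclidean distance-based record linkage, which makes it easy to compare the two variants on the same data.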
Similar resources
Supervised learning approach for distance based record linkage as disclosure risk evaluation
In data privacy, record linkage is a well-known technique to evaluate the disclosure risk of protected data. It is used to evaluate the number of linked records between a data set and its protected version. In this paper we give an overview of the work that we have been doing over the last few months. We describe the development of a supervised learning method for distance-based record linkage, w...
Supervised learning using Mahalanobis distance for record linkage
In data privacy, record linkage is a well-known technique used to evaluate the disclosure risk of protected data. The main idea is to link records from different databases that refer to the same individuals. In this paper we introduce a new parametrized variation of record linkage relying on the Mahalanobis distance, and a supervised learning method to determine the opti...
Using Mahalanobis Distance-Based Record Linkage for Disclosure Risk Assessment
Distance-based record linkage (DBRL) is a common approach to empirically assessing the disclosure risk in SDC-protected microdata. Usually, the Euclidean distance is used. In this paper, we explore the potential advantages of using the Mahalanobis distance for DBRL. We illustrate our point for partially synthetic microdata and show that, in some cases, Mahalanobis DBRL can yield a very high re-...
Learnable Similarity Functions and Their Applications to Record Linkage and Clustering
Many machine learning tasks require similarity functions that estimate likeness between observations. Similarity computations are particularly important for clustering and record linkage algorithms that depend on accurate estimates of the distance between datapoints. However, standard measures such as string edit distance and Euclidean distance often fail to capture an appropriate notion of sim...
Choquet integral for record linkage
Record linkage is used in data privacy to evaluate the disclosure risk of protected data. It models potential attacks, where an intruder attempts to link records from the protected data to the original data. In this paper we introduce a novel distance-based record linkage method, which uses the Choquet integral to compute the distance between records. We use a fuzzy measure to weight each subset of va...
Journal title:
Volume, issue:
Pages: -
Publication date: 2017